allows the model to have access to a wide array of information, including plot summaries, analyses, reviews,
and discussions about earlier Batman movies.
In 2022-23, numerous Large Language Models (LLMs) marked their arrival. Microsoft took
a stake in OpenAI and brought its GPT models to the Azure platform, thus making them accessible for enterprise
use. In parallel, Google has been building Vertex AI, Amazon has been building AWS Bedrock, and Meta
(formerly Facebook) released an open-source LLM, Llama.
With every passing day, as this book is being written in 2023, Gen AI models' capabilities are
growing rapidly, and enterprises are harnessing their power in numerous ways. Keeping aligned with
our goal of understanding the convergence of Objects, Data, and AI with enterprise applications, let us
examine how Gen AI's capabilities can be used directly by software developers.
Before getting into the details, let us formally define what a Large Language Model is.
14.4.1. Large Language Models (LLMs)
A large language model (LLM) is a type of artificial intelligence (AI) model that has been trained on a
massive dataset of text and code and is designed to understand, generate, and process human language.
Large language models are trained on trillions of words over many weeks or months, using
large amounts of compute power. Built on the transformer architecture, LLMs are trained to learn the
patterns, grammar, syntax, and semantics of human languages. These models are capable of performing a
variety of language-related tasks, including text generation, translation, summarization, sentiment analysis,
and more. These foundation models, with billions of parameters, exhibit emergent properties beyond
language alone, and data scientists are unlocking their ability to break down complex tasks, reason, and
solve problems.
LLMs are built with a significant number of parameters, often ranging from hundreds of millions to hundreds of billions.
The sheer scale of these models contributes to their ability to understand and generate human-like text. For
example, the GPT-3 model has 175 billion parameters, orders of magnitude more than a typical machine
learning model. Some other popular LLMs are BERT, BLOOM, Llama, PaLM, etc. These foundation
models, also referred to as base models, differ in size based on their parameter counts. Parameters can be
thought of as the memory of a model: broadly speaking, the more parameters a model has, the better its
task-performing ability tends to be.
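Parameter count translates directly into hardware requirements. The arithmetic below is a rough sketch: it assumes 2 bytes per parameter (float16 storage), and ignores the additional memory that serving a model actually needs (activations, key-value caches, optimizer state during training). The function name is my own, not a standard API.

```python
# Back-of-the-envelope memory footprint for LLM weights.
# Assumes 2 bytes per parameter (float16); real deployments also
# need memory for activations, KV caches, and (in training) optimizer state.

def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate memory, in decimal gigabytes, to hold just the weights."""
    return num_params * bytes_per_param / 1e9

# GPT-3 scale: 175 billion parameters -> ~350 GB in float16,
# far beyond what a single consumer GPU can hold.
gpt3_params = 175_000_000_000
print(f"{weight_memory_gb(gpt3_params):.0f} GB")  # -> 350 GB
```

This is one reason enterprises typically consume such models as hosted services rather than running them on their own hardware.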
These models undergo two main phases: pretraining and fine-tuning. During pretraining, models are
trained on a diverse range of text data, which equips them with a deep statistical representation of
language. During this phase, LLMs learn from huge amounts of unstructured textual data, ranging
from gigabytes to petabytes and drawn from a variety of sources, including internet scrapes and specially
curated corpora (selected and organized collections of textual data that have been assembled and
processed for specific purposes, such as training language models or conducting research in natural
language processing). This self-supervised learning step is where the model internalizes the intricate
patterns and structures inherent in language. We have already discussed the training of a neural network
in an earlier chapter, hence I shall not go into those details here.
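What makes this step "self-supervised" is that the raw text supplies its own labels: at each position, the target is simply the next token. The toy sketch below illustrates the idea using whitespace-separated words as stand-in tokens; real LLMs use learned subword tokenizers and neural networks, not lists of tuples.

```python
# Self-supervised pretraining needs no human annotation: each position
# in the raw text provides its own training target, the next token.
# Whitespace-split words stand in for real subword tokens here.

def next_token_pairs(text: str):
    """Turn raw text into (context, next-token) training pairs."""
    tokens = text.split()
    return [(tuple(tokens[:i]), tokens[i]) for i in range(1, len(tokens))]

pairs = next_token_pairs("the model learns from raw text")
print(pairs[0])   # -> (('the',), 'model')
print(pairs[-1])  # -> (('the', 'model', 'learns', 'from', 'raw'), 'text')
```

Every sentence scraped from the web thus yields many training examples for free, which is what makes training on trillions of words feasible at all.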
While pretraining teaches a model to predict the next token or word, that is not the same as following
instructions. Fine-tuning involves training the model on specific tasks or datasets to adapt it to perform
those tasks effectively. (The GPT-3 model, for instance, was pretrained on a dataset of text and code over
500 GB in size.) By either using these models in their original form or applying fine-tuning techniques to
adapt them to an enterprise-specific use case, we can rapidly build customized solutions without needing
to train a new model from scratch.
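Conceptually, fine-tuning continues training from already-learned weights rather than starting over. The following toy analogue uses a bigram count table in place of neural weights: a model "pretrained" on generic text is then updated with a small enterprise-specific corpus, and its predictions shift accordingly. This is purely illustrative of the idea; it is not how LLM fine-tuning is actually implemented.

```python
from collections import Counter, defaultdict

# Toy analogue of fine-tuning: bigram counts stand in for model
# weights. We keep training the same table instead of starting
# from an empty one.

def train(table, text):
    """Update bigram counts in place from whitespace-split text."""
    words = text.split()
    for cur, nxt in zip(words, words[1:]):
        table[cur][nxt] += 1
    return table

def predict(table, word):
    """Return the most frequently observed continuation."""
    return table[word].most_common(1)[0][0]

# "Pretraining" on generic text.
model = train(defaultdict(Counter), "open a file and read a file")
print(predict(model, "a"))   # -> "file"

# "Fine-tuning": the same model, updated with enterprise-domain text.
train(model, "raise a ticket and assign a ticket and close a ticket")
print(predict(model, "a"))   # -> "ticket"
```

The key point the sketch captures is reuse: the domain corpus is tiny compared to the pretraining data, yet it is enough to shift the model's behavior toward the enterprise use case.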